perm filename WILMAC[COM,LSP] blob
sn#828929 filedate 1986-11-21 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00005 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 ∂15-Nov-86 1606 willc%tekchips.tek.csnet@RELAY.CS.NET macro proposal
C00037 00003 Sketch of a New Macro Proposal for Common Lisp
C00069 00004 \appendix{A.}{Possible Solutions to the Macro Name Collision Problem}
C00092 00005
C00120 ENDMK
C⊗;
∂15-Nov-86 1606 willc%tekchips.tek.csnet@RELAY.CS.NET macro proposal
Received: from RELAY.CS.NET by SAIL.STANFORD.EDU with TCP; 15 Nov 86 16:05:37 PST
Received: from tektronix by csnet-relay.csnet id bd12568; 15 Nov 86 5:52 EST
Received: by tektronix.TEK (5.31/6.16)
id AA25710; Fri, 14 Nov 86 21:19:35 PST
Received: by tekchips.TEK (5.31/6.16)
id AA11538; Fri, 14 Nov 86 21:19:48 PST
Message-Id: <8611150519.AA11538@tekchips.TEK>
To: rpg@SU-AI.ARPA
Cc: eugene%iuvax.cs.indiana.edu@RELAY.CS.NET, gls@ZARATHUSTRA.THINK.COM,
gjs@MC.LCS.MIT.EDU, jar@MC.LCS.MIT.EDU
Subject: macro proposal
Date: 14 Nov 86 21:19:42 PST (Fri)
From: willc%tekchips.tek.csnet@RELAY.CS.NET
Here's the macro proposal. I'm sending it to several other parties because
I'd like to hear what they think.
I'm not very happy with the proposal. The amount of machinery seems out
of proportion to the size of the problem; Eugene might be able to reassure
me here. I'd prefer to lay the burden on the programmer. I think the
access paths, which I believe are derived from a suggestion made by JAR,
are a good idea.
I wasn't able to implement it in time, and I'm not sure how much I'd learn
from a quick implementation. My main concern is efficiency. I think it
will work fine on small programs, but I'm worried about how it will scale.
That's why I spent so much space on the asymptotic efficiency, and I still
don't know whether it's practical.
The proposal includes hardly any of the ideas in Kohlbecker's thesis. In
particular it does not do automatic renaming of bound variables. I have
nothing against his ideas, and I'd be happy to think about incorporating
them in the future, but I found that free variables gave me enough to worry
about for the time I had. I also found it simpler to think about the
algorithm without trying to make it match up with Eugene's algorithm.
It shouldn't be any more expensive than Eugene's algorithm, which has been
in use at Indiana University, but I doubt that the people at IU have used
it for anything but fairly small programs.
So it should be considered a straw proposal for the purpose of stimulating
discussion. It's not formal enough to be anything else.
----------------------------------------------------------------
Sketch of a New Macro Proposal for Common Lisp
by Will Clinger
1. Introduction.
This proposal is intended to explain how the Common Lisp macro facility
could be modified to prevent inadvertant capture of free variables
inserted by a macro. The existing macro facility makes no attempt to
solve this problem. As a result, it is believed that all currently
available implementations of Common Lisp contain latent bugs in their
standard macros.
The probability that a user program would run afoul of one of these latent
bugs would increase if the function and value environments were merged.
The factor by which the danger would increase is (V + F) / F, where V is
the number of variables bound by LAMBDA, LET, DO, et cetera, and F is the
number of variables bound by FLET and LABELS. Some people feel that the
danger is negligible in Common Lisp as now defined, but that (V + F) / F
times the current danger is non-negligible.
2. The Problem.
Suppose (setf (car x) y) expands into
(let ((G1 x) (G2 y)) (rplaca G1 G2) G2)
Then
(flet ((rplaca (x) (setq a x)))
...
(setf (car x) y)
...)
will not work properly. The fault lies in the fact that the SETF macro
blithely inserts the free variable RPLACA without considering the
possibility that RPLACA may be bound locally.
(A similar problem occurs if (the function value of) RPLACA has been
assigned using (SETF (SYMBOL-FUNCTION 'RPLACA) ...). This problem
could be ameliorated by adding to Common Lisp a mechanism for declaring
function variables to be constant much as ordinary variables can be
declared constant using DEFCONSTANT. This problem will be addressed
later.)
If the setf macro were instead defined so that (setf (car x) y) expanded
into
(let ((G1 x) (G2 y))
(funcall (quote #<PROCEDURE RPLACA>)
G1
G2)
G2)
where #<PROCEDURE RPLACA> is the function value yielded by evaluating
(FUNCTION RPLACA) in a virgin Common Lisp, then the SETF macro would
work correctly.
(Gabriel and Pitman point out some of the difficulties that attend quoted
procedure values. This proposal will go on to suggest another mechanism
that avoids free variables, but quoting is simpler.)
The main reason that people do not write correct macros is that correct
macros are harder to write than incorrect macros:
(defsetf car (x) (y)
`(progn (rplaca ,x ,y)
,y))
is easier to write than
(defsetf car (x) (y)
`(progn (funcall ',#'rplaca ,x ,y)
,y))
or even
(defsetf car (x) y)
`(progn (',rplaca ,x ,y)
,y))
where the last of the three examples simply removes the unnecessary
complexity caused by the separate function and value environments in
Common Lisp.
2.1 How Bad is the Problem?
Many of the standard Common Lisp macros must insert free variables.
The SETF macro happens to be particularly dangerous because there are many
different free variables that it is capable of inserting into a program,
and most of those variables are not documented in Common Lisp, the
Language. The problem is not unique to SETF, however. All Common Lisp
macros that insert free variables have potential for such mischief.
The Common Lisp culture encourages the use of macros, so many user-defined
macros have been and will be written with the same problem.
How does Scheme deal with this problem? Officially Scheme doesn't have
macros at all because no one has ever designed a sufficiently good macro
facility, but to drop all pretense: Scheme has many fewer standard macros.
The only standard Scheme macros that might insert free variables are the
CASE, QUASIQUOTE, and DELAY macros. The Scheme culture discourages the
use of macros, and encourages people who write macros to use quoted values
or similar mechanisms instead of free variables.
It must be said that the problem of free variables inserted by macros is
hardly the most serious problem with Common Lisp.
3. Possible Solutions.
3.1 Pretend the Problem Doesn't Exist
This is the traditional solution. It worked fine so long as no one
pretended that Lisp was an industrial strength language.
3.2 Make the Programmer Do It
The next edition of Common Lisp, the Language could point out the problem
and urge that programmers avoid it by using quoted values in place of
free variables. Failing that, macro writers could be encouraged to
initialize an uninterned symbol or symbol from an obscure package and
to use such symbols as the free variables.
The main problem with this is that the macro writer has to think.
3.3 Try to Make the Problem Go Away
The problem would go away for variables in function contexts if FLET
and LABELS were removed from Common Lisp, and the function environment
stayed separate from the value environment.
The problem would remain for variables in value contexts, but such
variables are probably less common than free variables in function
contexts. It might therefore be reasonable to make the programmer
take care of variables in value contexts. This approach is something
of a middle ground between ignoring the problem and making the macro
writer worry about it.
3.4 Make the System Do It
The semantics of Common Lisp macros could be changed so that if X is a
free variable inserted by a macro, then for all free references to X
throughout the expanded macro:
1. (... X ...) is replaced by (... (QUOTE VALUE) ...)
2. (X ...) is replaced by (FUNCALL (QUOTE FVALUE) ...)
where VALUE is the value value of X at macro expansion time and FVALUE
is the function value of X at macro expansion time.
3.4.1 A Simplified Proposal Suitable for Scheme.
The issues that surround macros are quite complex in their own right.
In order to consider these issues in the simplest possible context,
the remainder of this section will use Scheme instead of Common Lisp.
Since there is only one lexical environment in Scheme, the simplified
solution is:
If X is a free variable inserted by a macro, then all free references
to X throughout the expanded macro are replaced by (QUOTE VALUE), where
VALUE is the value of X at macro expansion time.
3.4.2 An Algorithm.
The primary purpose of this algorithm is to provide a more concrete
meaning for the phrase "free variable inserted by a macro", and to
provide a framework for discussion of the remaining issues.
The algorithm is presented in terms of a hypothetical MACRO-EXPAND
procedure for Scheme, corresponding to the MACROEXPAND function in
Common Lisp. A companion MACRO-EXPAND1 procedure will be assumed,
corresponding to the MACROEXPAND1 function in Common Lisp.
For simplicity, the algorithm is presented as though all macros are
global. The changes that need to be made for local macros are discussed
later.
If V is a variable and M and N are expressions, then the notation M[N / V]
denotes the expression obtained by substituting N for all free references
to V in M. Free occurrences of V on the left hand side of an assignment
are *not* substituted; my use of the notation is therefore nonstandard.
The notation M[N1 / V1][N2 / V2]...[Nk / Vk] indicates cascaded
substitutions from left to right.
MACRO-EXPAND takes an expression as its argument and returns two
values: a completely macro-expanded expression X and a set F of variables
free in X. (In general F will be a subset of the variables free in X, but
it will usually contain most of the free variables.) MACRO-EXPAND works
by case analysis on its argument. In Scheme there are three broad cases:
Case 1. The argument is a lambda expression. (Lambda expressions are
the only primitive binding constructs in Scheme. All other binding
constructs are macros that expand into lambda expressions.) MACRO-EXPAND
returns (1) the result of substituting for each subexpression the result
of calling itself recursively on the subexpression; (2) the union of the
sets of free variables computed for each subexpression, minus the set of
variables bound by the lambda expression.
Case 2. The argument is neither a lambda expression nor a macro call.
MACRO-EXPAND returns (1) the result of substituting for each subexpression
the result of calling itself recursively on the subexpression; (2) the
union of the sets of free variables computed for each subexpression.
Case 3. The argument is a macro call M. (This is the interesting case.)
Let A be the set of symbols that appear in M. Let N be the set of names
of all special forms and macros that are in effect for M. (Since we are
for the moment assuming that all macros are global, N is the set of names
of all special forms and macros that have been defined.) Let X be the
completely expanded macro obtained by calling MACRO-EXPAND recursively
on the result of (MACRO-EXPAND1 X). Let F be the set of free variables
returned by this recursive call to MACRO-EXPAND. Let {V1, ..., Vn} =
F - A - N. Then MACRO-EXPAND returns
(1) X[(QUOTE VALUE1) / V1]...[(QUOTE VALUEn) / Vn] where
VALUE1, ..., VALUEn are the values of V1, ..., Vn at macro
expansion time;
(2) the empty set.
4. Issues and Extensions
4.1 Efficiency
Macro expansion will take longer if the proposal is adopted. Currently
macro expansion normally takes O(n) time, where n is the size of the
program. The proposed algorithm normally takes O(n log n) time, and
is O(n↑2) in the worst case.
The size of the program is a vague concept, because it must take into
account not only the size of the original program and the size of the
fully macro-expanded program but also the size of any intermediate
stages. I have assumed that macro expansion never decreases the size
of the program, which allows me to take the size of the program to be
the size of the fully macro-expanded program.
4.1.1 Derivation of O(n log n)
The asymptotic run time of the algorithm is determined by the time
required to compute the set A of symbols that appear in a macro call
and by the time required to perform the substitution for V1, ..., Vn.
The set A must in general be re-computed for each macro call because
macros can perform arbitrary transformations of the expression tree.
If we assume that the probability that a node in the expression tree
represents a macro call is reasonably independent of its position in
the tree, then we might as well assume that all nodes in the tree are
macro calls (because that does not change the asymptotic run time).
We can also compute an upper bound on the time by assuming that macro
calls do not change the size of the tree. (The run time would be less
if macro calls increased the size of the tree because that would mean
that the intermediate stages were smaller than the program size we are
assuming.)
The time required to compute the set of symbols that appear in a tree
is proportional to the number of nodes in the tree. So is the time
required to substitute for free variables. Assuming that each node
in the tree is a macro call, the run time of the algorithm is
proportional to the sum of the number of nodes in the expression tree
and all its subtrees, where each subtree is taken to be a distinct
tree. Call this sum S.
For linear expression trees, S is equal to (h * (h + 1)) / 2, where h
is the height of the tree. Since n is equal to h in this case, the run
time is proportional to n↑2. An example of a program on which the
algorithm runs in O(n↑2) time:
(defun add1000 (n)
(succ
(succ
(succ
(succ
...
(succ
(succ
(succ
(succ
(succ n))))) ... )))))
where (SUCC X) is a macro call that expands into (1+ X).
I believe this is a pathological case, since the width of a programmer's
display places a practical bound on the height of the expression tree.
For binary expression trees, S is equal to (h - 1) * (2↑h) + 1, where h
is the height of the tree. Since n equals 2↑n - 1 in this case, the
run time is proportional to n log←2 n.
The flatter the expression tree, the lower the run time in proportion
to the size, but it remains O(n log n).
4.1.2 What O(n log n) Means in Practice
Macro expansion might take several times as long as it does now.
4.1.3 Ways to Improve Efficiency
It is possible that most macros do not insert free variables. (This
is certainly true in Scheme.) For such macros the proposal is as
efficient as macro expansion is now.
The expression trees for most macro calls contain substantial subtrees
that will be preserved intact when the macro call is expanded. If the
algorithm is modified so that a hash table is used to remember the sets
of symbols that occur in the subtrees, then they will not have to be
traversed a second time to determine the set of symbols that appear in
them.
4.1.4 Implications for Pure Interpreters
Pure interpreters that expand macros repeatedly at run time will suffer
from macro expansion even more than they do now.
4.1.5 Implications for Compilers
My guess is that the effect of slower macro expansion may be noticeable
in fast compilers but not in heavy-duty optimizing compilers.
4.2 Some Variables Should be Free
A macro that expands into an expression containing a free reference to
*PRINT-BASE* probably wants to use the run-time value of the variable,
not the macro-expansion time value. A syntax or declaration must be
available so that macro writers can tell the system the names of variables
that the macro writer intended for the macro to insert as new free
variables.
The macro expansion algorithm can be changed so that the macro expander
does not substitute for special (dynamically scoped) variables. This
may make the need to override the default less common, but some sort
of override will still be needed.
4.2.1 The System Can't Read the Programmer's Mind
There are three possible times at which the value of a free variable
inserted by a macro call could be obtained: macro definition time,
macro expansion time, and run time. Only one of them can be the default;
macros that need to obtain the value at another time will require the
programmer to give explicit instructions. In Common Lisp as now defined
the default is run time. This proposal changes the default to macro
expansion time.
It seems that the best default would be to use the value of X at macro
definition time, but at macro definition time it is impossible to determine
which variables might be inserted by the macro. Consider
(defmacro menu ()
`(list ,(todays-special) ,@(usual-entrees) ,@(beverages)))
This problem can only be solved by adopting a significantly more
restrictive macro definition language than is currently used in Common
Lisp.
4.2.2 Assignments are Still a Problem
There is no perfectly reliable way, either in Common Lisp as now defined
or using the proposal outlined here, to write a macro that refers to the
run-time value of a free variable or assigns to such a variable. The
problem is that the macro must insert a free occurrence of the variable,
either as a reference or in an assignment, but has no way to protect
itself from being used in a context where the variable is bound locally
for some irrelevant purpose.
The best solution to this problem seems to be some kind of absolute
path to the variable. If Scheme-style environments were supported,
then it would be sufficient to have a single global variable containing
a root environment that everyone agrees not to bind. (The language
could enforce this, just as (I presume) Common Lisp prevents T and NIL
from being bound.) Then an ACCESS special form as used in MIT Scheme
could be used within the expanded macro to reach the variable:
(ACCESS FOO *ROOT-ENVIRONMENT*)
(ACCESS FOO
(ACCESS MY-ENV
(ACCESS PROJECT-ENV
(ACCESS ORGANIZATION-ENV *ROOT-ENVIRONMENT*))))
If environments were supported in this manner, then it might be better
for the macro expander to substitute
(ACCESS V (ACCESS MACRO-ENV *ROOT-ENVIRONMENT*))
for free occurrences of V than to substitute (QUOTE VALUE), where VALUE
is the value of V. This would mean that the default semantics would be
to use the run-time value of the variable, as in current Common Lisp,
while solving the problem of free references. If ACCESS expressions
were permitted on the left hand side of assignments, and if the macro
expansion algorithm proposed above were modified to substitute for assigned
variables as well as for referenced variables, then the assignment problem
would be solved as well.
Furthermore if MACRO-ENV were a constant in the *ROOT-ENVIRONMENT*, then
the code generated by the compiler for the access path could be made as
efficient as an ordinary reference or assignment to a global variable.
4.3 Local Macros
Local macros such as those introduced by Common Lisp's MACROLET require
the macro expansion algorithm to take an extra argument containing the
macro environment.
The macro expansion algorithm given above does not substitute for the
names of special forms and macros. This is necessary, for example,
so that (LET (...) ...) can expand into ((LAMBDA (...) ...) ...). This
becomes problematic in the presence of local macros.
What if LAMBDA is bound locally as a macro? Perhaps it should not be
possible to shadow built-in special forms with local macros.
What if LET is bound locally as a macro, so that macros that expand
into a LET won't work? Perhaps it should not be possible to shadow
any special form or macro with a local macro. This will cause
compile-time errors in some cases, but compile-time errors are better
than wrong answers.
If a free references to a variable (Common Lisp: in function context)
is inserted deliberately by a macro, overriding the default substitution,
then the free reference won't work if it appears within the scope of a
local macro of the same name. The use of environments and access paths
seems the best solution to this problem.
5. Summary of Proposal
The whole proposal can be separated into the parts listed below. Each
part can be considered separately, but the parts are interdependent in
the sense that dropping one will require dropping or modifying others.
Part 1 is not really part of the macro proposal but is consistent with
it and would simplify the other parts.
1. Merge function and value environments.
2. Local macros also appear in the merged environment.
3. Special forms and macros cannot be shadowed either by variable
bindings or by local macro definitions.
4. Variables can be shadowed by local macro definitions.
5. Add Scheme-style environments and access paths.
6. Keep a root environment in a variable that can neither be
assigned nor bound.
7. The macro expander automatically substitutes for free variables.
8. The macro expander substitutes access paths for free variables.
9. Special (dynamically bound) variables are immune to the
automatic substitution performed by the macro expander.
10. Programmers can get around the automatic substitution by declaring
variables for which the macro expander should not substitute.
Perhaps not all of these proposals fit the character of Common Lisp, but
it cannot be denied that they add up to a great deal of machinery to
solve a fairly small problem.
Sketch of a New Macro Proposal for Common Lisp
by Will Clinger
1. Introduction.
This proposal is intended to explain how the Common Lisp macro facility
could be modified to prevent inadvertant capture of free variables
inserted by a macro. The existing macro facility makes no attempt to
solve this problem. As a result, it is believed that all currently
available implementations of Common Lisp contain latent bugs in their
standard macros.
The probability that a user program would run afoul of one of these latent
bugs would increase if the function and value environments were merged.
The factor by which the danger would increase is (V + F) / F, where V is
the number of variables bound by LAMBDA, LET, DO, et cetera, and F is the
number of variables bound by FLET and LABELS. Some people feel that the
danger is negligible in Common Lisp as now defined, but that (V + F) / F
times the current danger is non-negligible.
2. The Problem.
Suppose (setf (car x) y) expands into
(let ((G1 x) (G2 y)) (rplaca G1 G2) G2)
Then
(flet ((rplaca (a x) (setq a x)))
...
(setf (car x) y)
...)
will not work properly. The fault lies in the fact that the SETF macro
blithely inserts the free variable RPLACA without considering the
possibility that RPLACA may be bound locally.
(A similar problem occurs if (the function value of) RPLACA has been
assigned using (SETF (SYMBOL-FUNCTION 'RPLACA) ...). This problem
could be ameliorated by adding to Common Lisp a mechanism for declaring
function variables to be constant much as ordinary variables can be
declared constant using DEFCONSTANT. This problem will be addressed
later.)
If the setf macro were instead defined so that (setf (car x) y) expanded
into
(let ((G1 x) (G2 y))
(funcall (quote #<PROCEDURE RPLACA>)
G1
G2)
G2)
where #<PROCEDURE RPLACA> is the function value yielded by evaluating
(FUNCTION RPLACA) in a virgin Common Lisp, then the SETF macro would
work correctly.
(Gabriel and Pitman point out some of the difficulties that attend quoted
procedure values. This proposal will go on to suggest another mechanism
that avoids free variables, but quoting is simpler.)
The main reason that people do not write correct macros is that correct
macros are harder to write than incorrect macros:
(defsetf car (x) (y)
`(progn (rplaca ,x ,y)
,y))
is easier to write than
(defsetf car (x) (y)
`(progn (funcall ',#'rplaca ,x ,y)
,y))
or even
(defsetf car (x) y)
`(progn (',rplaca ,x ,y)
,y))
where the last of the three examples simply removes the unnecessary
complexity caused by the separate function and value environments in
Common Lisp.
2.1 How Bad is the Problem?
Many of the standard Common Lisp macros must insert free variables.
The SETF macro happens to be particularly dangerous because there are many
different free variables that it is capable of inserting into a program,
and most of those variables are not documented in Common Lisp, the
Language. The problem is not unique to SETF, however. All Common Lisp
macros that insert free variables have potential for such mischief.
The Common Lisp culture encourages the use of macros, so many user-defined
macros have been and will be written with the same problem.
How does Scheme deal with this problem? Officially Scheme doesn't have
macros at all because no one has ever designed a sufficiently good macro
facility, but to drop all pretense: Scheme has many fewer standard macros.
The only standard Scheme macros that might insert free variables are the
CASE, QUASIQUOTE, and DELAY macros. The Scheme culture discourages the
use of macros, and encourages people who write macros to use quoted values
or similar mechanisms instead of free variables.
It must be said that the problem of free variables inserted by macros is
hardly the most serious problem with Common Lisp.
3. Possible Solutions.
3.1 Pretend the Problem Doesn't Exist
This is the traditional solution. It worked fine so long as no one
pretended that Lisp was an industrial strength language.
3.2 Make the Programmer Do It
The next edition of Common Lisp, the Language could point out the problem
and urge that programmers avoid it by using quoted values in place of
free variables. Failing that, macro writers could be encouraged to
initialize an uninterned symbol or symbol from an obscure package and
to use such symbols as the free variables.
The main problem with this is that the macro writer has to think.
3.3 Try to Make the Problem Go Away
The problem would go away for variables in function contexts if FLET
and LABELS were removed from Common Lisp, and the function environment
stayed separate from the value environment.
The problem would remain for variables in value contexts, but such
variables are probably less common than free variables in function
contexts. It might therefore be reasonable to make the programmer
take care of variables in value contexts. This approach is something
of a middle ground between ignoring the problem and making the macro
writer worry about it.
3.4 Make the System Do It
The semantics of Common Lisp macros could be changed so that if X is a
free variable inserted by a macro, then for all free references to X
throughout the expanded macro:
1. (... X ...) is replaced by (... (QUOTE VALUE) ...)
2. (X ...) is replaced by (FUNCALL (QUOTE FVALUE) ...)
where VALUE is the value value of X at macro expansion time and FVALUE
is the function value of X at macro expansion time.
3.4.1 A Simplified Proposal Suitable for Scheme.
The issues that surround macros are quite complex in their own right.
In order to consider these issues in the simplest possible context,
the remainder of this section will use Scheme instead of Common Lisp.
Since there is only one lexical environment in Scheme, the simplified
solution is:
If X is a free variable inserted by a macro, then all free references
to X throughout the expanded macro are replaced by (QUOTE VALUE), where
VALUE is the value of X at macro expansion time.
3.4.2 An Algorithm.
The primary purpose of this algorithm is to provide a more concrete
meaning for the phrase "free variable inserted by a macro", and to
provide a framework for discussion of the remaining issues.
The algorithm is presented in terms of a hypothetical MACRO-EXPAND
procedure for Scheme, corresponding to the MACROEXPAND function in
Common Lisp. A companion MACRO-EXPAND1 procedure will be assumed,
corresponding to the MACROEXPAND1 function in Common Lisp.
For simplicity, the algorithm is presented as though all macros are
global. The changes that need to be made for local macros are discussed
later.
If V is a variable and M and N are expressions, then the notation M[N / V]
denotes the expression obtained by substituting N for all free references
to V in M. Free occurrences of V on the left hand side of an assignment
are *not* substituted; my use of the notation is therefore nonstandard.
The notation M[N1 / V1][N2 / V2]...[Nk / Vk] indicates cascaded
substitutions from left to right.
MACRO-EXPAND takes an expression as its argument and returns two
values: a completely macro-expanded expression X and a set F of variables
free in X. (In general F will be a subset of the variables free in X, but
it will usually contain most of the free variables.) MACRO-EXPAND works
by case analysis on its argument. In Scheme there are three broad cases:
Case 1. The argument is a lambda expression. (Lambda expressions are
the only primitive binding constructs in Scheme. All other binding
constructs are macros that expand into lambda expressions.) MACRO-EXPAND
returns (1) the result of substituting for each subexpression the result
of calling itself recursively on the subexpression; (2) the union of the
sets of free variables computed for each subexpression, minus the set of
variables bound by the lambda expression.
Case 2. The argument is neither a lambda expression nor a macro call.
MACRO-EXPAND returns (1) the result of substituting for each subexpression
the result of calling itself recursively on the subexpression; (2) the
union of the sets of free variables computed for each subexpression.
Case 3. The argument is a macro call M. (This is the interesting case.)
Let A be the set of symbols that appear in M. Let N be the set of names
of all special forms and macros that are in effect for M. (Since we are
for the moment assuming that all macros are global, N is the set of names
of all special forms and macros that have been defined.) Let X be the
completely expanded macro obtained by calling MACRO-EXPAND recursively
on the result of (MACRO-EXPAND1 X). Let F be the set of free variables
returned by this recursive call to MACRO-EXPAND. Let {V1, ..., Vn} =
F - A - N. Then MACRO-EXPAND returns
(1) X[(QUOTE VALUE1) / V1]...[(QUOTE VALUEn) / Vn] where
VALUE1, ..., VALUEn are the values of V1, ..., Vn at macro
expansion time;
(2) the empty set.
4. Issues and Extensions
4.1 Efficiency
Macro expansion will take longer if the proposal is adopted. Currently
macro expansion normally takes O(n) time, where n is the size of the
program. The proposed algorithm normally takes O(n log n) time, and
is O(n↑2) in the worst case.
The size of the program is a vague concept, because it must take into
account not only the size of the original program and the size of the
fully macro-expanded program but also the size of any intermediate
stages. I have assumed that macro expansion never decreases the size
of the program, which allows me to take the size of the program to be
the size of the fully macro-expanded program.
4.1.1 Derivation of O(n log n)
The asymptotic run time of the algorithm is determined by the time
required to compute the set A of symbols that appear in a macro call
and by the time required to perform the substitution for V1, ..., Vn.
The set A must in general be re-computed for each macro call because
macros can perform arbitrary transformations of the expression tree.
If we assume that the probability that a node in the expression tree
represents a macro call is reasonably independent of its position in
the tree, then we might as well assume that all nodes in the tree are
macro calls (because that does not change the asymptotic run time).
We can also compute an upper bound on the time by assuming that macro
calls do not change the size of the tree. (The run time would be less
if macro calls increased the size of the tree because that would mean
that the intermediate stages were smaller than the program size we are
assuming.)
The time required to compute the set of symbols that appear in a tree
is proportional to the number of nodes in the tree. So is the time
required to substitute for free variables. Assuming that each node
in the tree is a macro call, the run time of the algorithm is
proportional to the sum of the number of nodes in the expression tree
and all its subtrees, where each subtree is taken to be a distinct
tree. Call this sum S.
For linear expression trees, S is equal to (h * (h + 1)) / 2, where h
is the height of the tree. Since n is equal to h in this case, the run
time is proportional to n↑2. An example of a program on which the
algorithm runs in O(n↑2) time:
(defun add1000 (n)
(succ
(succ
(succ
(succ
...
(succ
(succ
(succ
(succ
(succ n))))) ... )))))
where (SUCC X) is a macro call that expands into (1+ X).
I believe this is a pathological case, since the width of a programmer's
display places a practical bound on the height of the expression tree.
For binary expression trees, S is equal to (h - 1) * (2↑h) + 1, where h
is the height of the tree. Since n equals 2↑n - 1 in this case, the
run time is proportional to n log←2 n.
The flatter the expression tree, the lower the run time in proportion
to the size, but it remains O(n log n).
4.1.2 What O(n log n) Means in Practice
Macro expansion might take several times as long as it does now.
4.1.3 Ways to Improve Efficiency
It is possible that most macros do not insert free variables. (This
is certainly true in Scheme.) For such macros the proposal is as
efficient as macro expansion is now.
The expression trees for most macro calls contain substantial subtrees
that will be preserved intact when the macro call is expanded. If the
algorithm is modified so that a hash table is used to remember the sets
of symbols that occur in the subtrees, then they will not have to be
traversed a second time to determine the set of symbols that appear in
them.
4.1.4 Implications for Pure Interpreters
Pure interpreters that expand macros repeatedly at run time will suffer
from macro expansion even more than they do now.
4.1.5 Implications for Compilers
My guess is that the effect of slower macro expansion may be noticeable
in fast compilers but not in heavy-duty optimizing compilers.
4.2 Some Variables Should be Free
A macro that expands into an expression containing a free reference to
*PRINT-BASE* probably wants to use the run-time value of the variable,
not the macro-expansion time value. A syntax or declaration must be
available so that macro writers can tell the system the names of variables
that the macro writer intended for the macro to insert as new free
variables.
The macro expansion algorithm can be changed so that the macro expander
does not substitute for special (dynamically scoped) variables. This
may make the need to override the default less common, but some sort
of override will still be needed.
4.2.1 The System Can't Read the Programmer's Mind
There are three possible times at which the value of a free variable
inserted by a macro call could be obtained: macro definition time,
macro expansion time, and run time. Only one of them can be the default;
macros that need to obtain the value at another time will require the
programmer to give explicit instructions. In Common Lisp as now defined
the default is run time. This proposal changes the default to macro
expansion time.
It seems that the best default would be to use the value of X at macro
definition time, but at macro definition time it is impossible to determine
which variables might be inserted by the macro. Consider
(defmacro menu ()
`(list ,(todays-special) ,@(usual-entrees) ,@(beverages)))
This problem can only be solved by adopting a significantly more
restrictive macro definition language than is currently used in Common
Lisp.
4.2.2 Assignments are Still a Problem
There is no perfectly reliable way, either in Common Lisp as now defined
or using the proposal outlined here, to write a macro that refers to the
run-time value of a free variable or assigns to such a variable. The
problem is that the macro must insert a free occurrence of the variable,
either as a reference or in an assignment, but has no way to protect
itself from being used in a context where the variable is bound locally
for some irrelevant purpose.
The best solution to this problem seems to be some kind of absolute
path to the variable. If Scheme-style environments were supported,
then it would be sufficient to have a single global variable containing
a root environment that everyone agrees not to bind. (The language
could enforce this, just as (I presume) Common Lisp prevents T and NIL
from being bound.) Then an ACCESS special form as used in MIT Scheme
could be used within the expanded macro to reach the variable:
(ACCESS FOO *ROOT-ENVIRONMENT*)
(ACCESS FOO
(ACCESS MY-ENV
(ACCESS PROJECT-ENV
(ACCESS ORGANIZATION-ENV *ROOT-ENVIRONMENT*))))
If environments were supported in this manner, then it might be better
for the macro expander to substitute
(ACCESS V (ACCESS MACRO-ENV *ROOT-ENVIRONMENT*))
for free occurrences of V than to substitute (QUOTE VALUE), where VALUE
is the value of V. This would mean that the default semantics would be
to use the run-time value of the variable, as in current Common Lisp,
while solving the problem of free references. If ACCESS expressions
were permitted on the left hand side of assignments, and if the macro
expansion algorithm proposed above were modified to substitute for assigned
variables as well as for referenced variables, then the assignment problem
would be solved as well.
Furthermore if MACRO-ENV were a constant in the *ROOT-ENVIRONMENT*, then
the code generated by the compiler for the access path could be made as
efficient as an ordinary reference or assignment to a global variable.
4.3 Local Macros
Local macros such as those introduced by Common Lisp's MACROLET require
the macro expansion algorithm to take an extra argument containing the
macro environment.
The macro expansion algorithm given above does not substitute for the
names of special forms and macros. This is necessary, for example,
so that (LET (...) ...) can expand into ((LAMBDA (...) ...) ...). This
becomes problematic in the presence of local macros.
What if LAMBDA is bound locally as a macro? Perhaps it should not be
possible to shadow built-in special forms with local macros.
What if LET is bound locally as a macro, so that macros that expand
into a LET won't work? Perhaps it should not be possible to shadow
any special form or macro with a local macro. This will cause
compile-time errors in some cases, but compile-time errors are better
than wrong answers.
If a free references to a variable (Common Lisp: in function context)
is inserted deliberately by a macro, overriding the default substitution,
then the free reference won't work if it appears within the scope of a
local macro of the same name. The use of environments and access paths
seems the best solution to this problem.
5. Summary of Proposal
The whole proposal can be separated into the parts listed below. Each
part can be considered separately, but the parts are interdependent in
the sense that dropping one will require dropping or modifying others.
Part 1 is not really part of the macro proposal but is consistent with
it and would simplify the other parts.
1. Merge function and value environments.
2. Local macros also appear in the merged environment.
3. Special forms and macros cannot be shadowed either by variable
bindings or by local macro definitions.
4. Variables can be shadowed by local macro definitions.
5. Add Scheme-style environments and access paths.
6. Keep a root environment in a variable that can neither be
assigned nor bound.
7. The macro expander automatically substitutes for free variables.
8. The macro expander substitutes access paths for free variables.
9. Special (dynamically bound) variables are immune to the
automatic substitution performed by the macro expander.
10. Programmers can get around the automatic substitution by declaring
variables for which the macro expander should not substitute.
Perhaps not all of these proposals fit the character of Common Lisp, but
it cannot be denied that they add up to a great deal of machinery to
solve a fairly small problem.
\appendix{A.}{Possible Solutions to the Macro Name Collision Problem}
With the problems with macros described earlier, one would hope that there
could be some solutions to these problems if Common Lisp were to
become a \lispone/ dialect.
There are four basic approaches to the macro name collision problem.
\numitem{1.}Do nothing.
\numitem{2.}Eliminate the Lisp features that contribute to create the
problem.
\numitem{3.}Devise a macro definition facility and macro expansion
algorithm that enable the programmer to write macros in the current
Common Lisp style
with most of the name collision problems solved in the `obvious' way.
\numitem{4.}Provide linguistic support for the programmer to express
concisely what is meant.
In Common Lisp today, macros are implemented as if the approach to the
macro problem were approach~1. Although all of the problems of macros
explained in this paper exist in \lisptwo/, programmers routinely write
powerful macros frequently. In \lispone/ the potential for stumbling into
an error is greater than in a \lisptwo/, but programmers can learn to work
around the problems relatively easily.
In Common Lisp approach~2 can be implemented by removing
\smcp{FLET}, \smcp{LABELS}, and \smcp{MACROLET}. In \lispone/, approach~2
might require eliminating macros and using functions as the main
abstraction mechanism rather than macros. The compiler can be directed to
inline code the functions that form the basis for abstraction, gaining
speed. The only aspect of macros that are not as easily modelled this way
is the optimization mentioned earlier in the section called {\it The Benefits
of Macros}.
Approach~3 requires attending to the free variables inserted by
macro definitions. If \fix{X} is a free variable inserted by a macro,
then all free references to \fix{X} throughout the expanded macro must be
replaced by \fix{(QUOTE~VALUE)}, where \fix{VALUE} is the value of \fix{X}
at macro expansion time.
Recall the following code for a \lispone/ dialect:
\begintt
(DEFMACRO FIRST (LIST) `(CAR ,LIST))
(DEFMACRO REST (LIST) `(CDR ,LIST))
(DEFUN TEST-CAR (CAR TEST-LIST)
"The `driver' program for Frobozz Automobile, Inc.'s
quality assurance test."
(DO ((TESTS TEST-LIST (REST TESTS)))
((NULL TESTS))
((FIRST TESTS) CAR)))
\endtt
\noindent This code suffers from the insertion of the free variable
\smcp{CAR} into a context in which \smcp{CAR} is bound.
In some Scheme implementations it is possible to write the following
to solve this problem:
\begintt
(DEFMACRO FIRST (LIST) `(',CAR ,LIST))
\endtt
\noindent This is syntactically more tedious but has the obvious advantage
that there is no potential for name confusion. In this macro, the value
of \smcp{CAR} is inserted into the resulting macro expanded form at macro
expansion time. The value is, presumably, the function that implements
\smcp{CAR}. Nevertheless, something like this doesn't instantly answer
all technical problems. When uses of this macro are compiled to a file,
one of two things must be output to the file and later read back in:
\numitem{a.}an object such as \smcp{\#$\langle$COMPILED-FUNCTION
CAR$\rangle$}, which is the name of a compiled code object
\numitem{b.}a representation of the compiled code object for \smcp{CAR}.
Lisp and Scheme implementations that can output only names of compiled
code objects will need to produce named functions along with references to
them in compiled code files for those functions that are incidentally
defined in macro definitions. For example, consider the following, more
complex macro definition and use:
\begintt
(DEFMACRO FOO (N X) `(',(LAMBDA (X) (+ X N)) ,X))
(FOO 3 Z)
\endtt
\noindent The incidental function \fix{(LAMBDA (X) (+ X N))} will need a
name.
For Lisp and Scheme implementations that output representations of code
objects into files, there is an additional problem, which is that for each
use of a macro like \smcp{FOO} there may be a separate copy of the
compiled code for the anonymous function in the macro definition. Also,
in addition to writing the compiled code to a file, all procedures
necessary to support it must also be correctly saved and restored by the
file compiler.
One problem with outputting code objects to a file is that there is a
possibility of having multiple copies of the same code object in the Lisp
image when the file is loaded. Some systems are able to collapse
structurally equal compiled-code objects into only one copy when the
occurrences of them are all in a single file, but this requires some work
and does not solve the problem of the same code object appearing in
multiple files.
When the speed of compiled code is important, programmers will wish their
compiler to open-code invocations of Lisp primitives like \smcp{CAR}.
Usually compilers notice such occurrences of \smcp{CAR} by recognizing
occurrences of the symbol \smcp{CAR}. When macros indicate the invocation
of primitives like \smcp{CAR} and the approach being discussed is in
force, compilers will need to notice occurrences of \smcp{CAR} by
recognizing occurrences of the compiled code object that implements
\smcp{CAR}. For instance, in the example above, the compiler ought to
open-code the original \smcp{CAR} operation in the \smcp{TEST-CAR} code
rather than to code a function call to \smcp{CAR}. To do this, the
compiler must be made to understand a new set of idioms, and the size and
complexity of the compiler will possibly grow.
Even when speed of compiled code is not an issue, open coding these idioms
will help in a few cases where a code object would then not have to be
written to a file, but will not help when open coding in not desired by
the programmer either for reasons of space efficiency or modularity.
Once it is possible for the rest of the Lisp system to handle the
internal form of macros, it is time to turn our attention to
the tediousness of the syntax.
A proposed solution to this aspect of the macro problem is presented in
Eugene Kohlbecker's PhD thesis
\ref:Kohlbecker.1986., in which the author asserts that macros can be written
in a style that is like the current Common Lisp style, but free function
position variables can be defaultly taken to be the globally defined
function of the same name.
Will Clinger proposed the following sketch for an algorithm for being able
to write macros in current Common Lisp style where free variables inserted
by the macros have their macro-expansion-time values substituted.
The algorithm is presented in terms of a hypothetical \smcp{MACRO-EXPAND},
corresponding to the \smcp{MACROEXPAND} function in Common Lisp. A
companion \smcp{MACRO-EXPAND1} procedure will be assumed corresponding to
the \smcp{MACROEXPAND1}. For simplicity, the algorithm is presented as
though all macros are global; \lispone/ is assumed also.
If $V$ is a variable and $M$ and $N$ are expressions, then the notation
$M[N/V]$ denotes the expression obtained by substituting $N$ for all free
references to $V$ in $M$. Free occurrences of $V$ on the left hand side
of an assignment are {\it not} substituted; this use of the notation is
therefore nonstandard. The notation $M[N↓1/V↓1]\ldots[N↓k/V↓k]$ indicates
cascaded substitutions from left to right.
\smcp{MACRO-EXPAND} takes an expression as its argument and returns two
values: a completely macro-expanded expression $X$ and a set $F$ of
variables free in $X$. (In general $F$ will be a subset of the variables
free in $X$, but it will usually contain most of the free variables.)
\smcp{MACRO-EXPAND} works by case analysis on its argument. There are
three broad cases:
Case 1. The argument is a binding form. \smcp{MACRO-EXPAND} returns
\numitem{a.}the result of substituting for each subexpression the result
of calling itself recursively on the subexpression
\numitem{b.}the union of the sets of free variables computed for each
subexpression, minus the set of variables bound by the binding form.
Case 2. The argument is neither a binding form nor a macro call.
\smcp{MACRO-EXPAND} returns
\numitem{a.}the result of substituting for each subexpression the result
of calling itself recursively on the subexpression
\numitem{b.}the union of the sets of free variables computed for each
subexpression.
Case 3. The argument is a macro call $M$. Let $A$ be the set of symbols
that appear in $M$. Let $N$ be the set of names of all special forms and
macros that are in effect for $M$. (Since we are for the moment assuming
that all macros are global, $N$ is the set of names of all special forms
and macros that have been defined.) Let $X$ be the completely expanded
macro obtained by calling \smcp{MACRO-EXPAND} recursively on the result of
\fix{(MACRO-EXPAND1~X)}. Let $F$ be the set of free variables returned by
this recursive call to \smcp{MACRO-EXPAND}. Let $\{V↓1,\dots V↓n\}=F-A-N$.
Then \smcp{MACRO-EXPAND} returns
\numitem{a.}the form
$$X[(\hbox{QUOTE}\quad\hbox{VALUE}↓1)/V↓1]\ldots[(\hbox{QUOTE}\quad
\hbox{VALUE}↓n)/V↓n]$$
\numitem{}where $\hbox{VALUE}↓1,\ldots,\hbox{VALUE}↓n$ are the values of
$V↓1,\ldots,V↓n$ at macro expansion time, using the proper namespace.
\numitem{b.}the empty set.
This last case is understood with an example. Suppose that the macro is
defined as
\begintt
(DEFMACRO FIRST (LIST) `(CAR ,LIST))
\endtt
\noindent and the macro form is
\begintt
(FIRST X)
\endtt
\noindent At the beginning of case 3, $A=\{\hbox{FIRST},\hbox{LIST}\}$ and
$N=\{\hbox{FIRST}\}$
\noindent \smcp{MACRO-EXPAND1} produces
\begintt
(CAR X)
\endtt
\noindent and $F=\{\hbox{CAR},\hbox{Y}\}$. Finally,
$F-A-N=\{\hbox{CAR}\}$. The final result is
\begintt
('#<COMPILED-FUNCTION CAR> X)
\endtt
With this algorithm it can be seen that macros are written in the familiar
style and substitution for free variables is performed automatically. The
problem, though, is that all free variables are not to be treated alike.
Consider this macro
\begintt
(DEFMACRO HACK-PRINT-BASE-WRT-Q (Q)
`(HACK-MIGHTILY *PRINT-BASE* ,Q))
\endtt
\noindent The runtime value of \smcp{*PRINT-BASE*} probably is meant here. The
algorithm cannot intuit the intention of the programmer, so linguistic
means of indicating this case---or other cases---is needed.
Another problem with an automatic solution like this is that some
parameter names are inadvertently the same as free variables introduced by
the macro. For example
\begintt
(DEFMACRO HEADLIGHTS (CAR)
`(CAR ,CAR))
\endtt
\noindent Here Clinger's algorithm will not substitute for the occurrence
of \smcp{CAR} in the functional position because \smcp{CAR} will appear in
the set $A$.
Clinger's algorithm is substituting the values of free variables for
occurrences of those free variables. There are three possible times at
which the value of a free variable inserted by a macro call could be
obtained: macro definition time, macro expansion time, and run time.
Only one of them can be the default; macros that need to obtain the value
at another time will require the programmer to give explicit instructions.
In Common Lisp as now defined the default is run time. This above
algorithm changes the default to macro expansion time.
There is no perfectly reliable way, either in Common Lisp as now defined
or using the proposal outlined here, to write a macro that refers to the
run-time value of a free variable or assigns to such a variable. The
problem is that the macro must insert a free occurrence of the variable,
either as a reference or in an assignment, but has no way to protect
itself from being used in a context where the variable is bound locally
for some irrelevant purpose.
Finally, the run time of this macro expansion algorithm is very much
slower than the current Common Lisp macro expansion algorithm.
These considerations lead us to the conclusion that automatic solutions to
the macro problem need to be augmented with richer macro language in which
the programmer can state concisely and precisely how the macro should be
expanded and to what variables free variables should refer. We are led,
then, to approach~4; approach~4 involves, possibly, a complex set of
aspects.
One aspect to the solution might include a new declaration form, which declares
function definitions constant so that Common Lisp implementations could
declare all Common Lisp functions constant. The correct behavior of
\lispone/ with the built-in functions declared constant would be to signal
an error when cases like \smcp{TEST-CAR} above occur. However, functions
without this declaration would not receive the benefit of this check and
macros that need to use them would continue to be error-prone. Also, it
would be necessary for users to know which names in the language were
constant even if they did not plan to use them for their intended use. To
such users, these names would appear to be holes in the space of available
names. For example, an automobile manufacturer would likely prefer to use
the name \smcp{FIRST} rather than \smcp{CAR} to access the first element
of a list exactly because the manufacturer would want to recycle the name
\smcp{CAR} for some other purpose. Packages can be used to alleviate some
of these problems, but many view packages as fairly clumsy in their own
right.
Another aspect to the solution involves how to precisely name the
environment in which to find free variable references. One solution to
the problem of referring to run time values of variables is to introduce
some kind of absolute path to the variable, perhaps using first-class
environments rooted in a global lexical environment. With such
first-class environments a special form, perhaps named \smcp{ACCESS}
following MIT Scheme, would be used to access the value of a variable in a
specified environment.
Short of this, some linguistic addition to the macro definition language
is needed so that the programmer can express his or her intentions to the
macro expander.
\vfill\eject
\noheadline
Kohlbecker's algorithm does the right thing after all. Rather, one of
Kohlbecker's algorithms does the right thing with free variables---the
algorithm that he describes in lavish detail doesn't, and the change
that makes it work with free variables is mentioned as though it were
a minor implementation detail.
I took Gerry's advice and moved the stuff on how to make it possible for
programmers to write macros manually before the stuff on how to make it
automatic.
I'm much, much happier with this version than with the one I wrote.
Peace, Will
----------------------------------------------------------------
\appendix{A.}{Possible Solutions to the Macro Name Collision Problem}
With the problems with macros described earlier, one would hope that there
could be some solutions to these problems if Common Lisp were to
become a \lispone/ dialect.
There are four basic approaches to the macro name collision problem.
\numitem{1.}Do nothing.
\numitem{2.}Eliminate the Lisp features that contribute to create the
problem.
\numitem{3.}Provide linguistic support so the programmer can express
concisely what is meant.
\numitem{4.}Devise a macro definition facility and macro expansion
algorithm that enable the programmer to write macros in the current
Common Lisp style with most of the name collision problems solved in
the `obvious' way.
In Common Lisp today, macros are implemented as if the approach to the
macro problem were approach~1. Although all of the problems of macros
explained in this paper exist in \lisptwo/, programmers routinely and
frequently write powerful macros. In \lispone/ the potential for stumbling
into an error is greater than in a \lisptwo/, but programmers can learn to
work around the problems relatively easily.
In Common Lisp approach~2 can be implemented by removing
\smcp{FLET}, \smcp{LABELS}, and \smcp{MACROLET}. In \lispone/ approach~2
might require eliminating macros and using functions as the main
abstraction mechanism rather than macros. The compiler can be directed to
inline code the functions that form the basis for abstraction, gaining
speed. The only aspect of macros that are not as easily modelled this way
is the optimization mentioned earlier in the section called {\it The Benefits
of Macros}.
Approach~3 requires the macro writer to write macro definitions in such
a way that they do not insert new free variables. Instead of writing a
macro so that it inserts a free variable \fix{X}, the programmer would
write the macro so that the macro expansion time value of \fix{X} appears
as a quoted constant wherever \fix{X} would otherwise have been inserted.
For example, recall the following code for a \lispone/ dialect:
\begintt
(DEFMACRO FIRST (LIST) `(CAR ,LIST))
(DEFMACRO REST (LIST) `(CDR ,LIST))
(DEFUN TEST-CAR (CAR TEST-LIST)
"The `driver' program for Frobozz Automobile, Inc.'s
quality assurance test."
(DO ((TESTS TEST-LIST (REST TESTS)))
((NULL TESTS))
((FIRST TESTS) CAR)))
\endtt
\noindent This code suffers from the insertion of the free variable
\smcp{CAR} into a context in which \smcp{CAR} is bound.
In some Scheme implementations it is possible to write the following
to solve this problem:
\begintt
(DEFMACRO FIRST (LIST) `(',CAR ,LIST))
\endtt
\noindent This is syntactically more tedious but has the obvious advantage
that there is no potential for name confusion. In this macro, the value
of \smcp{CAR} is inserted into the resulting macro expanded form at macro
expansion time. The value is, presumably, the function that implements
\smcp{CAR}. Nevertheless, something like this doesn't instantly answer
all technical problems. When uses of this macro are compiled to a file,
one of two things must be output to the file and later read back in:
\numitem{a.}an object such as \smcp{\#$\langle$COMPILED-FUNCTION
CAR$\rangle$}, which is the name of a compiled code object
\numitem{b.}a representation of the compiled code object for \smcp{CAR}.
Lisp and Scheme implementations that can output only names of compiled
code objects will need to produce named functions along with references to
them in compiled code files for those functions that are incidentally
defined in macro definitions. For example, consider the following, more
complex macro definition and use:
\begintt
(DEFMACRO FOO (N X) `(',(LAMBDA (X) (+ X N)) ,X))
(FOO 3 Z)
\endtt
\noindent The incidental function \fix{(LAMBDA (X) (+ X N))} will need a
name.
For Lisp and Scheme implementations that output representations of code
objects into files, there is an additional problem, which is that for each
use of a macro like \smcp{FOO} there may be a separate copy of the
compiled code for the anonymous function in the macro definition. Also,
in addition to writing the compiled code to a file, all procedures
necessary to support it must also be correctly saved and restored by the
file compiler.
One problem with outputting code objects to a file is that there is a
possibility of having multiple copies of the same code object in the Lisp
image when the file is loaded. Some systems are able to collapse
structurally equal compiled-code objects into only one copy when the
occurrences of them are all in a single file, but this requires some work
and does not solve the problem of the same code object appearing in
multiple files.
When the speed of compiled code is important, programmers will wish their
compiler to open-code invocations of Lisp primitives like \smcp{CAR}.
Usually compilers notice such occurrences of \smcp{CAR} by recognizing
occurrences of the symbol \smcp{CAR}. When macros indicate the invocation
of primitives like \smcp{CAR} and the approach being discussed is in
force, compilers will need to notice occurrences of \smcp{CAR} by
recognizing occurrences of the compiled code object that implements
\smcp{CAR}. For instance, in the example above, the compiler ought to
open-code the original \smcp{CAR} operation in the \smcp{TEST-CAR} code
rather than to code a function call to \smcp{CAR}. To do this, the
compiler must be made to understand a new set of idioms, and the size and
complexity of the compiler will possibly grow.
Even when speed of compiled code is not an issue, open coding these idioms
will help in a few cases where a code object would then not have to be
written to a file, but will not help when open coding is not desired by
the programmer either for reasons of space efficiency or modularity.
Another kind of linguistic support that would help to make approach~3 work
is a new declaration form for the purpose of declaring function definitions
to be pervasive constants. Common Lisp implementations could then
declare all Common Lisp functions constant. The correct behavior of
\lispone/ with the built-in functions declared constant would be to signal
an error when cases like \smcp{TEST-CAR} above occur. However, functions
without this declaration would not receive the benefit of this check and
macros that need to use them would continue to be error-prone. Also, it
would be necessary for users to know which names in the language were
constant even if they did not plan to use them for their intended use. To
such users, these names would appear to be holes in the space of available
names. For example, an automobile manufacturer would likely prefer to use
the name \smcp{FIRST} rather than \smcp{CAR} to access the first element
of a list exactly because the manufacturer would want to recycle the name
\smcp{CAR} for some other purpose. Packages can be used to alleviate some
of these problems, but many view packages as fairly clumsy in their own
right. If on the other hand the semantics were such that
constant declarations apply only to the scope of the constant variable,
then the constant declarations would not help the macro writer very much.
It appears that Common Lisp today offers no perfectly reliable way for
a macro to access a global variable. In a \lispone/ dialect one could
write
\begintt
(DEFMACRO HEAD-OF-QUEUE () `(',CAR (',SYMBOL-VALUE '*QUEUE*)))
\endtt
\noindent but the current \lisptwo/ dialect requires \smcp{FUNCALL} to be
used here, and there is no way to use the macro expansion time value of
\smcp{FUNCALL} without relying on the run time value of \smcp{FUNCALL} as
well, unless the code using this macro is compiled to a file and the
compiler writes out the code object for \smcp{FUNCALL} or unless the
compiler can open code the use of \smcp{FUNCALL}, as discussed above. One
solution, of course, is to declare \smcp{FUNCALL} to be a constant using
the mechanism proposed above. Another solution is to extend the \lisptwo/
dialect to use \lispone/ evaluation rules when the function position of a
function call is neither a symbol nor a lambda expression.
In general, approach~3 requires that the programmer be given a precise
way to name the environment in which to find free variable references.
One possible solution is to introduce some kind of absolute path to
variables, perhaps using first-class environments rooted in a global
lexical environment. With such first-class environments a special form,
perhaps named \smcp{ACCESS} following MIT Scheme, would be used to access
the value of a variable in a specified environment. This approach would
avoid most of the problems associated with quoted procedure values.
Once it is possible for the rest of the Lisp system to handle the
internal form of macros, it is time to turn our attention to
the tediousness of the syntax and ways in which it can be relieved
through automatic means.
Approach~4 can be based on a macro expansion
algorithm presented in Eugene Kohlbecker's PhD thesis
\ref:Kohlbecker.1986.. This algorithm was
designed to solve a different problem that occurs when bound variables
inserted by a macro inadvertantly capture variables of the same name
that appear in the macro call, but one of the variations
mentioned by Kohlbecker solves
the problem of free variables inserted by a macro as well.
The algorithm takes an expression as input and returns a
completely macro-expanded expression. The algorithm operates in three
major phases. In the first phase, it simply places a {\it time-stamp}
of $0$ on all symbols that appear in the expression; the time-stamped symbol
$V$ will be written as $V:0$ in the examples.
The time-stamping will make it possible
to distinguish between free variables that were inserted by macro
expansion and free variables that were already present in the macro
call. All symbols must be time-stamped, whether free or bound,
whether they are variables or merely components of a data structure,
because it is impossible to tell the difference before macro expansion.
Following this time-stamping, the time-stamp counter is set to $1$.
The time-stamping is implemented by substituting a \smcp{GENSYM} for
each symbol, taking care that all occurrences of a
symbol are replaced by the same \smcp{GENSYM}. Both the original symbol and
the time-stamp can be stored on the property list of the \smcp{GENSYM}.
What is important is that time-stamped symbols be distinguishable from
symbols that have not been time-stamped, that the time-stamp of a
time-stamped symbol can be determined, and that the original symbol
can be recovered from the time-stamped symbol.
The second phase of the algorithm
is presented in terms of a hypothetical \smcp{MACRO-EXPAND}, which roughly
corresponds to the \smcp{MACROEXPAND} function in Common Lisp. A
companion \smcp{MACRO-EXPAND1} procedure will be assumed corresponding to
the \smcp{MACROEXPAND1}. For simplicity, the algorithm is presented as
though all macros are global; \lispone/ is assumed also.
\smcp{MACRO-EXPAND} works by case analysis on its argument.
There are two broad cases:
Case 1. The argument is not a macro call. \smcp{MACRO-EXPAND}
returns the result of substituting for each subexpression the
result of calling itself recursively on the subexpression.
Case 2. The argument is a macro call $M$. \smcp{MACRO-EXPAND}
calls \smcp{MACRO-EXPAND1} to obtain the result $X$ of expanding the
top-level macro call and then time-stamps all unstamped symbols in
$X$ with the current value of the time-stamp counter. It then
increments the time-stamp counter before calling itself
recursively on the time-stamped version of $X$.
The result of \smcp{MACRO-EXPAND} is a completely macro-expanded
expression in which
\numitem{a.}all symbols are time-stamped.
\numitem{b.}free variables derived from the original unexpanded
expression have a time-stamp of $0$.
\numitem{c.}free variables that have been inserted by a macro have
time-stamps greater than $0$.
\numitem{d.}bound variables that were inserted when a macro call
was expanded have time-stamps greater than the time-stamps of
all variables that appeared in the unexpanded macro call.
The third phase of the algorithm simply traverses the result of the
second phase and:
\numitem{a.}unstamps all symbols that name special forms or appear
in constant (quoted) structure.
\numitem{b.}unstamps all free variables whose time stamp is $0$.
\numitem{c.}unstamps all free variables whose time stamp is greater
than $0$.
\numitem{d.}leaves bound variables alone.
As presented above, the modified algorithm arranges for free variables
inserted by a macro to be evaluated at run time in the global environment.
However, the algorithm is fairly independent of the strategy chosen to
deal with such variables; for example, case~(c) of the third
phase could be replaced by
\numitem{c'.}unstamps all occurrences of free variables whose time
stamp is greater than $0$ and appear on the left hand side of a
\fix{SETQ}; replaces all other references to free variables whose time
stamp is greater than $0$ with \fix{(QUOTE VALUE)}, where \fix{VALUE}
is the value of the unstamped variable at macro expansion time.
Two examples show how this algorithm allows macros to be written in the
familiar style, while any necessary substitution for free variables is
performed automatically. Suppose a macro is defined as
\begintt
(DEFMACRO FIRST (LIST) `(CAR ,LIST))
\endtt
\noindent and the input expression is
\begintt
(LET ((CAR '(A B C))) (FIRST CAR))
\endtt
The output of the time-stamping phase is
\begintt
(LET:0 ((CAR:0 '(A:0 B:0 C:0))) (FIRST:0 CAR:0))
\endtt
\noindent and the result of the second phase is something like
\begintt
((LAMBDA:0 (CAR:0) (CAR:1 CAR:0)) '(A:0 B:0 C:0))
\endtt
\noindent so the result of the third phase is either
\begintt
((LAMBDA (CAR:0) (CAR CAR:0)) '(A B C))
\endtt
\noindent or
\begintt
((LAMBDA (CAR:0) ('#<COMPILED-FUNCTION CAR> CAR:0)) '(A B C))
\endtt
\noindent depending on whether the third phase uses case~(c) or~(c').
\fix{CAR:0} is a \smcp{GENSYM}, so either way should work.
Kohlbecker's algorithm eliminates another common source of
bugs in macros. Suppose that a restricted form of the OR macro were
defined via
\begintt
(DEFMACRO OR2 (X Y)
`(LET ((V ,X))
(IF V V ,Y)))
\endtt
\noindent Then the expression
\begintt
(OR2 NIL V)
\endtt
\noindent would actually work because it would expand into something like
\begintt
((LAMBDA (V:1) (IF V:1 V:1 V)) NIL)
\endtt
\noindent where V:1 is a \smcp{GENSYM}.
The following example suggests that case~(c) should be used in the
third phase rather than case~(c'). Consider the macro
\begintt
(DEFMACRO HACK-PRINT-BASE-WRT-Q (Q)
`(HACK-MIGHTILY *PRINT-BASE* ,Q))
\endtt
\noindent The runtime value of \smcp{*PRINT-BASE*} probably is meant here.
The algorithm cannot intuit the intention of the programmer, so linguistic
means of indicating this case---or other cases---would be needed if case
(c') were used.
There are three possible times at
which the value of a free variable inserted by a macro call could be
obtained: macro definition time, macro expansion time, and run time.
Only one of them can be the default; macros that need to obtain the value
at another time will require the programmer to give explicit instructions.
In Common Lisp as now defined the default is run time. The above algorithm
also defaults to run time if case (c) is used in the third phase. If case~(c')
is used, the default is macro expansion time except on the left hand
side of an assignment where macro expansion time doesn't make sense.
One disadvantage of the
macro expansion algorithm is that it would require macros to call
an \smcp{UNSTAMP} function explicitly in some cases. For example,
\fix{(EQ (CAR CLAUSE) 'OTHERWISE)} would need to be written as
\fix{(EQ (UNSTAMP (CAR CLAUSE)) 'OTHERWISE)}.
Finally, this macro expansion algorithm is very much
slower than the current Common Lisp macro expansion algorithm.
It is straightforward, however, to combine the second and third phases
into a single pass, and the algorithm can be modified so that the
time-stamping that occurs in phase~2 can be omitted when expanding a
standard macro that is known not to insert free identifiers.
\vfill\eject
\noheadline